Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Exploring Human and Language Model Alignment in Perceived Design Similarity Using Ordinal EmbeddingsAbstract In recent years, large language models (LLMs) and vision language models (VLMs) have excelled at tasks requiring human-like reasoning, inspiring researchers in engineering design to use language models (LMs) as surrogate evaluators of design concepts. But do these models actually evaluate designs like humans? While recent work has shown that LM evaluations sometimes fall within human variance on Likert-scale grading tasks, those tasks often obscure the reasoning and biases behind the scores. To address this limitation, we compare LM word embeddings (trained to capture semantic similarity) with human-rated similarity embeddings derived from triplet comparisons (“is A closer to B than C?”) on a dataset of design sketches and descriptions. We assess alignment via local tripletwise similarity and embedding distances, allowing for deeper insights than raw Likert-scale scores provide. We also explore whether describing the designs to LMs through text or images improves alignment with human judgments. Our findings suggest that text alone may not fully capture the nuances humans key into, yet text-based embeddings outperform their multimodal counterparts on satisfying local triplets. On the basis of these insights, we offer recommendations for effectively integrating LMs into design evaluation tasks.more » « lessFree, publicly-accessible full text available October 1, 2026
-
Abstract Creativity is increasingly recognized as a core competency for the 21st century, making its development a priority in education, research, and industry. To effectively cultivate creativity, researchers and educators need reliable and accessible assessment tools. Recent software developments have significantly enhanced the administration and scoring of creativity measures; however, existing software often requires expertise in experiment design and computer programming, limiting its accessibility to many educators and researchers. In the current work, we introduce CAP—the Creativity Assessment Platform—a free web application for building creativity assessments, collecting data, and automatically scoring responses (cap.ist.psu.edu). CAP allows users to create custom creativity assessments in ten languages using a simple, point-and-click interface, selecting from tasks such as the Short Story Task, Drawing Task, and Scientific Creative Thinking Test. Users can automatically score task responses using machine learning models trained to match human creativity ratings—with multilingual capabilities, including the new Cross-Lingual Alternate Uses Scoring (CLAUS), a large language model achieving strong prediction of human creativity ratings in ten languages. CAP also provides a centralized dashboard to monitor data collection, score assessments, and automatically generate text for a Methods section based on the study’s tasks, metrics, and instructions—with a single click—promoting transparency and reproducibility in creativity assessment. Designed for ease of use, CAP aims to democratize creativity measurement for researchers, educators, and everyone in between.more » « less
-
Abstract When performing time-intensive optimization tasks, such as those in topology or shape optimization, researchers have turned to machine-learned inverse design (ID) methods—i.e., predicting the optimized geometry from input conditions—to replace or warm start traditional optimizers. Such methods are often optimized to reduce the mean squared error (MSE) or binary cross entropy between the output and a training dataset of optimized designs. While convenient, we show that this choice may be myopic. Specifically, we compare two methods of optimizing the hyperparameters of easily reproducible machine learning models including random forest, k-nearest neighbors, and deconvolutional neural network model for predicting the three optimal topology problems. We show that under both direct inverse design and when warm starting further topology optimization, using MSE metrics to tune hyperparameters produces less performance models than directly evaluating the objective function, though both produce designs that are almost one order of magnitude better than using the common uniform initialization. We also illustrate how warm starting impacts both the convergence time, the type of solutions obtained during optimization, and the final designs. Overall, our initial results portend that researchers may need to revisit common choices for evaluating ID methods that subtly tradeoff factors in how an ID method will actually be used. We hope our open-source dataset and evaluation environment will spur additional research in those directions.more » « lessFree, publicly-accessible full text available February 1, 2026
-
Abstract Adjoint-based design optimizations are usually computationally expensive and those costs scale with resolution. To address this, researchers have proposed machine learning approaches for inverse design that can predict higher-resolution solutions from lower cost/resolution ones. Due to the recent success of diffusion models over traditional generative models, we extend the use of diffusion models for multi-resolution tasks by proposing the conditional cascaded diffusion model (cCDM). Compared to GANs, cCDM is more stable to train, and each diffusion model within the cCDM can be trained independently, thus each model’s parameters can be tuned separately to maximize the performance of the pipeline. Our study compares cCDM against a cGAN model with transfer learning. Our results demonstrate that the cCDM excels in capturing finer details, preserving volume fraction constraints, and minimizing compliance errors in multi-resolution tasks when a sufficient amount of high-resolution training data (more than 102 designs) is available. Furthermore, we explore the impact of training data size on the performance of both models. While both models show decreased performance with reduced high-resolution training data, the cCDM loses its superiority to the cGAN model with transfer learning when training data is limited (less than 102), and we show the break-even point for this transition. Also, we highlight that while the diffusion model may achieve better pixel-wise performance in both low-resolution and high-resolution scenarios, this does not necessarily guarantee that the model produces optimal compliance error or constraint satisfaction.more » « less
-
Free, publicly-accessible full text available April 22, 2026
-
Abstract Well-studied techniques that enhance diversity in early design concept generation require effective metrics for evaluating human-perceived similarity between ideas. Recent work suggests collecting triplet comparisons between designs directly from human raters and using those triplets to form an embedding where similarity is expressed as a Euclidean distance. While effective at modeling human-perceived similarity judgments, these methods are expensive and require a large number of triplets to be hand-labeled. However, what if there were a way to use AI to replicate the human similarity judgments captured in triplet embedding methods? In this paper, we explore the potential for pretrained Large Language Models (LLMs) to be used in this context. Using a dataset of crowdsourced text descriptions written about engineering design sketches, we generate LLM embeddings and compare them to an embedding created from human-provided triplets of those same sketches. From these embeddings, we can use Euclidean distances to describe areas where human perception and LLM perception disagree regarding design similarity. We then implement this same procedure but with descriptions written from a template that attempts to isolate a particular modality of a design (i.e., functions, behaviors, structures). By comparing the templated description embeddings to both the triplet-generated and pre-template LLM embeddings, we identify ways of describing designs such that LLM and human similarity perception better agree. We use these results to better understand how humans and LLMs interpret similarity in engineering designs.more » « less
-
Abstract In Topology Optimization (TO) and related engineering applications, physics-constrained simulations are often used to optimize candidate designs given some set of boundary conditions. However, such models are computationally expensive and do not guarantee convergence to a desired result, given the frequent non-convexity of the performance objective. Creating data-based approaches to warm-start these models — or even replace them entirely — has thus been a top priority for researchers in this area of engineering design. In this paper, we present a new dataset of two-dimensional heat sink designs optimized via Multiphysics Topology Optimization (MTO). Further, we propose an augmented Vector-Quantized GAN (VQGAN) that allows for effective MTO data compression within a discrete latent space, known as a codebook, while preserving high reconstruction quality. To concretely assess the benefits of the VQGAN quantization process, we conduct a latent analysis of its codebook as compared to the continuous latent space of a deep AutoEncoder (AE). We find that VQGAN can more effectively learn topological connections despite a high rate of data compression. Finally, we leverage the VQGAN codebook to train a small GPT-2 model, generating thermally performant heat sink designs within a fraction of the time taken by conventional optimization approaches. We show the transformer-based approach is more effective than using a Deep Convolutional GAN (DCGAN) due to its elimination of mode collapse issues, as well as better preservation of topological connections in MTO and similar applications.more » « less
-
Many design problems involve reasoning about points in high-dimensional space. A common strategy is to first embed these high-dimensional points into a low-dimensional latent space. We propose that a good embedding should be isometric—i.e., preserving the geodesic distance between points on the data manifold in the latent space. However, enforcing isometry is non-trivial for common neural embedding models such as autoencoders. Moreover, while theoretically appealing, it is unclear to what extent is enforcing isometry necessary for a given design analysis. This paper answers these questions by constructing an isometric embedding via an isometric autoencoder, which we employ to analyze an inverse airfoil design problem. Specifically, the paper describes how to train an isometric autoencoder and demonstrates its usefulness compared to non-isometric autoencoders on the UIUC airfoil dataset. Our ablation study illustrates that enforcing isometry is necessary for accurately discovering clusters through the latent space. We also show how isometric autoencoders can uncover pathologies in typical gradient-based shape optimization solvers through an analysis on the SU2-optimized airfoil dataset, wherein we find an over-reliance of the gradient solver on the angle of attack. Overall, this paper motivates the use of isometry constraints in neural embedding models, particularly in cases where researchers or designers intend to use distance-based analysis measures to analyze designs within the latent space. While this work focuses on airfoil design as an illustrative example, it applies to any domain where analyzing isometric design or data embeddings would be useful.more » « less
-
This paper introduces Least Volume—a simple yet effective regularization inspired by geometric intuition—that can reduce the necessary number of latent dimensions needed by an autoencoder without requiring any prior knowledge of the intrinsic dimensionality of the dataset. We show that the Lipschitz continuity of the decoder is the key to making it work, provide a proof that PCA is just a linear special case of it, and reveal that it has a similar PCA-like importance ordering effect when applied to nonlinear models. We demonstrate the intuition behind the regularization on some pedagogical toy problems, and its effectiveness on several benchmark problems, including MNIST, CIFAR-10 and CelebA.more » « less
An official website of the United States government

Full Text Available